Summary

SAD Score on Selected Models

Base vs Chat Models

Effect of SP by Model

Effect of SP by Task

Correlations between SAD and MMLU

Correlations between SAD, SAD Categories and MMLU


Facts

Facts Category

Human Defaults

LLMs

Which LLM

Names


Influence

Influence Category


Introspection

Introspection Category

Count Tokens

Predict Words

Rules


Stages

Stages Category

Stages Full

Stages Oversight


Self Recognition

Self Recognition Category

Who

Groups


ID Leverage

ID Leverage Category

Entity Name

Multihop


Anti Imitation

Anti Imitation Category

Output Control

Do Not Imitate